{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Matrix Multiplication\n",
    "\n",
    "This tutorial demonstrates how to perform matrix multiplication operations using TT-NN, showcasing different memory configurations and layout conversions.\n",
    "We'll explore how to create random tensors on the device, perform matrix multiplication, and configure the operation for better performance on Tenstorrent hardware.\n",
    "\n",
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ttnn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open the Device\n",
    "\n",
    "Open a device to run the matrix multiplication operations on. Device ID 0 typically refers to the first available Tenstorrent accelerator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "device_id = 0\n",
    "device = ttnn.open_device(device_id=device_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tensor Configuration\n",
    "\n",
    "Set up the dimensions for the matrix multiplication A(m×k) × B(k×n) = C(m×n). This example uses 1024×1024 matrices; with 32×32 tiles, each dimension spans 1024 / 32 = 32 tiles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = 1024  # Number of rows in matrix A and result\n",
    "k = 1024  # Number of columns in A / rows in B (must match for valid matmul)\n",
    "n = 1024  # Number of columns in matrix B and result"
   ]
  },
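  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tile arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that assumes 32×32 tiles and assumes that a dimension which is not a multiple of 32 is padded up to the next full tile. `tile_counts` is a hypothetical helper for illustration, not part of TT-NN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TILE_SIZE = 32  # tile edge length used throughout this tutorial (32x32 tiles)\n",
    "\n",
    "\n",
    "def tile_counts(rows, cols, tile=TILE_SIZE):\n",
    "    '''Hypothetical helper: return (tile rows, tile cols, total tiles) for a rows x cols matrix.'''\n",
    "    # Ceiling division, assuming a partial tile is padded up to a full tile\n",
    "    tile_rows = -(-rows // tile)\n",
    "    tile_cols = -(-cols // tile)\n",
    "    return tile_rows, tile_cols, tile_rows * tile_cols\n",
    "\n",
    "\n",
    "print(tile_counts(1024, 1024))  # 1024x1024 matrix -> (32, 32, 1024)\n",
    "print(tile_counts(1000, 33))  # non-aligned shape -> (32, 2, 64)"
   ]
  },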
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize tensors a and b with random values\n",
    "\n",
    "Create random tensors directly on the device in `TILE_LAYOUT`, which stores data as 32×32 tiles, the unit that Tensix cores operate on. The `bfloat16` data type uses half the memory of `float32` while keeping the same 8-bit exponent, and therefore a similar numerical range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = ttnn.rand((m, k), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT)\n",
    "b = ttnn.rand((k, n), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT)"
   ]
  },
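  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what choosing `bfloat16` means for memory, the sketch below computes the raw storage needed for one 32×32 tile and for one 1024×1024 matrix. It assumes 2 bytes per `bfloat16` element and 4 bytes per `float32` element, and ignores padding and allocator overhead. `tensor_bytes` is a hypothetical helper for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "BYTES_PER_ELEMENT = {'bfloat16': 2, 'float32': 4}  # element sizes assumed for this sketch\n",
    "\n",
    "\n",
    "def tensor_bytes(rows, cols, dtype_name):\n",
    "    '''Hypothetical helper: raw storage in bytes for a rows x cols tensor, ignoring padding.'''\n",
    "    return rows * cols * BYTES_PER_ELEMENT[dtype_name]\n",
    "\n",
    "\n",
    "for dtype_name in ('bfloat16', 'float32'):\n",
    "    tile_bytes = tensor_bytes(32, 32, dtype_name)\n",
    "    matrix_bytes = tensor_bytes(1024, 1024, dtype_name)\n",
    "    print(f'{dtype_name}: {tile_bytes} bytes per 32x32 tile, {matrix_bytes / 2**20:.0f} MiB per 1024x1024 matrix')"
   ]
  },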
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Matrix multiply tensors a and b\n",
    "\n",
    "Perform matrix multiplication using the `@` operator, which is equivalent to calling `ttnn.matmul` with default settings.\n",
    "\n",
    "The first call takes longer because the kernels must be compiled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = a @ b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Re-running the operation is significantly faster because the compiled program is reused from the program cache."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "output = a @ b"
   ]
  },
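  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To measure the effect of program caching yourself, you can time individual calls. Device operations may return before the device has finished its work, so the sketch below calls a caller-supplied `sync` function before stopping the clock. `time_op` is a hypothetical helper, and using `ttnn.synchronize_device(device)` as `sync` is an assumption about the TT-NN API in your installed version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def time_op(op, sync, runs=3):\n",
    "    '''Hypothetical helper: call op() `runs` times and return the wall-clock seconds of each call.'''\n",
    "    timings = []\n",
    "    for _ in range(runs):\n",
    "        start = time.perf_counter()\n",
    "        op()\n",
    "        sync()  # wait until queued device work has finished before stopping the clock\n",
    "        timings.append(time.perf_counter() - start)\n",
    "    return timings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, `time_op(lambda: a @ b, lambda: ttnn.synchronize_device(device))` times three more multiplications of `a` and `b`. Since the kernels were already compiled by the first call, these runs should all be much shorter than that first call."
   ]
  },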
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect the layout of matrix multiplication output\n",
    "\n",
    "Print the current layout of the output tensor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(output.layout)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As shown, matrix multiplication produces its output in tile layout. Tenstorrent accelerators compute matrix multiplications much more efficiently on 32×32 tiles than on row-major data.\n",
    "\n",
    "For the same reason, inputs in row-major layout are automatically converted (tilized) to tile layout before the multiplication, and the logs then show one tilize operation per input. Here `a` and `b` were created in `TILE_LAYOUT`, so no conversion is needed.\n",
    "\n",
    "Learn more about the tile layout [here](https://github.com/tenstorrent/tt-metal/blob/main/tech_reports/tensor_layouts/tensor_layouts.md#32-tiled-layout)."
   ]
  },
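  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build intuition for what tilizing does, the NumPy sketch below regroups a row-major matrix into a grid of 32×32 tiles and back. It is a simplified model, not the TT-NN implementation: it assumes tile-aligned shapes, and on the hardware each tile is additionally stored as four 16×16 faces, which this sketch ignores. `tilize` and `untilize` here are hypothetical NumPy helpers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "TILE = 32  # tile edge length\n",
    "\n",
    "\n",
    "def tilize(matrix, tile=TILE):\n",
    "    '''Hypothetical sketch: regroup a row-major (rows, cols) matrix into (tile_rows, tile_cols, tile, tile).'''\n",
    "    rows, cols = matrix.shape\n",
    "    assert rows % tile == 0 and cols % tile == 0, 'this sketch assumes tile-aligned shapes'\n",
    "    # Split each axis into (tile index, offset within tile), then move both tile indices to the front\n",
    "    return matrix.reshape(rows // tile, tile, cols // tile, tile).transpose(0, 2, 1, 3)\n",
    "\n",
    "\n",
    "def untilize(tiles):\n",
    "    '''Inverse of tilize: reassemble the tile grid into a row-major matrix.'''\n",
    "    tile_rows, tile_cols, tile, _ = tiles.shape\n",
    "    return tiles.transpose(0, 2, 1, 3).reshape(tile_rows * tile, tile_cols * tile)\n",
    "\n",
    "\n",
    "demo = np.arange(64 * 96).reshape(64, 96)\n",
    "demo_tiles = tilize(demo)\n",
    "print(demo_tiles.shape)  # (2, 3, 32, 32): a 2 x 3 grid of 32x32 tiles\n",
    "print(np.array_equal(demo_tiles[1, 2], demo[32:64, 64:96]))  # True: tile (1, 2) is the bottom-right block\n",
    "print(np.array_equal(untilize(demo_tiles), demo))  # True: the round trip restores the matrix"
   ]
  },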
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect the result of the matrix multiplication\n",
    "\n",
    "To inspect the results, first convert the output to row-major layout."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = ttnn.to_layout(output, ttnn.ROW_MAJOR_LAYOUT)\n",
    "\n",
    "print(\"Printing ttnn tensor\")\n",
    "print(f\"shape: {output.shape}\")\n",
    "print(f\"chunk of a tensor:\\n{output[:1, :32]}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Matrix multiply tensors a and b using a more performant configuration\n",
    "The default matrix multiplication settings are not always the most efficient. Two options can speed up the operation significantly: keeping the inputs and the output in on-chip L1 memory instead of DRAM with `ttnn.L1_MEMORY_CONFIG`, and specifying the grid of cores the operation should use with `core_grid`. The cell below re-creates `a` and `b` in L1 memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = ttnn.rand((m, k), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT, memory_config=ttnn.L1_MEMORY_CONFIG)\n",
    "b = ttnn.rand((k, n), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT, memory_config=ttnn.L1_MEMORY_CONFIG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the operation once to compile the kernels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = ttnn.matmul(a, b, memory_config=ttnn.L1_MEMORY_CONFIG, core_grid=ttnn.CoreGrid(y=8, x=8))"
   ]
  },
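  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `core_grid=ttnn.CoreGrid(y=8, x=8)` the work is spread over 8 × 8 = 64 cores. The sketch below is a rough model of that split, assuming the output tiles are divided into equal rectangular blocks with one block per core; the blocking TT-NN actually chooses may differ, and `output_block_per_core` is a hypothetical helper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TILE_EDGE = 32  # tile edge length\n",
    "\n",
    "\n",
    "def output_block_per_core(m, n, grid_y, grid_x, tile=TILE_EDGE):\n",
    "    '''Hypothetical model: (tile rows, tile cols, tiles) of the output block each core computes.'''\n",
    "    tiles_m = -(-m // tile)  # output tile rows (ceiling division)\n",
    "    tiles_n = -(-n // tile)  # output tile columns\n",
    "    block_m = -(-tiles_m // grid_y)  # tile rows per core\n",
    "    block_n = -(-tiles_n // grid_x)  # tile columns per core\n",
    "    return block_m, block_n, block_m * block_n\n",
    "\n",
    "\n",
    "print(output_block_per_core(1024, 1024, grid_y=8, grid_x=8))  # (4, 4, 16) on an 8x8 grid\n",
    "print(output_block_per_core(1024, 1024, grid_y=1, grid_x=1))  # (32, 32, 1024) on a single core"
   ]
  },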
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Subsequent runs reuse the cached program and are much faster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = ttnn.matmul(a, b, memory_config=ttnn.L1_MEMORY_CONFIG, core_grid=ttnn.CoreGrid(y=8, x=8))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Close the device"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ttnn.close_device(device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Full Example and Output\n",
    "\n",
    "Let's put everything together in a complete example that can be run directly.\n",
    "\n",
    "[ttnn_basic_matrix_multiplication.py](https://github.com/tenstorrent/tt-metal/blob/main/ttnn/tutorials/basic_python/ttnn_basic_matrix_multiplication.py)\n",
    "\n",
    "Running this script will generate the following output:\n",
    "\n",
    "```console\n",
    "$ python3 $TT_METAL_HOME/ttnn/tutorials/basic_python/ttnn_basic_matrix_multiplication.py\n",
    "2025-10-23 09:03:21.386 | info     |          Device | Opening user mode device driver (tt_cluster.cpp:209)\n",
    "2025-10-23 09:03:21.512 | info     |             UMD | Harvesting mask for chip 0 is 0x20 (NOC0: 0x20, simulated harvesting mask: 0x0). (cluster.cpp:394)\n",
    "2025-10-23 09:03:21.751 | info     |             UMD | Opening local chip ids/PCIe ids: {0}/[2] and remote chip ids {} (cluster.cpp:252)\n",
    "2025-10-23 09:03:21.751 | info     |             UMD | All devices in cluster running firmware version: 18.10.0 (cluster.cpp:232)\n",
    "2025-10-23 09:03:21.751 | info     |             UMD | IOMMU: disabled (cluster.cpp:174)\n",
    "2025-10-23 09:03:21.751 | info     |             UMD | KMD version: 2.4.0 (cluster.cpp:177)\n",
    "2025-10-23 09:03:21.752 | info     |             UMD | Software version 6.0.0, Ethernet FW version 7.0.0 (Device 0) (cluster.cpp:1085)\n",
    "2025-10-23 09:03:21.765 | info     |             UMD | Pinning pages for Hugepage: virtual address 0x7f5480000000 and size 0x40000000 pinned to physical address 0x4c0000000 (pci_device.cpp:536)\n",
    "Layout.TILE\n",
    "Printing ttnn tensor\n",
    "shape: Shape([1024, 1024])\n",
    "chunk of a tensor:\n",
    "ttnn.Tensor([[258.0000, 260.0000,  ..., 266.0000, 272.0000]], shape=Shape([1, 32]), dtype=DataType::BFLOAT16, layout=Layout::ROW_MAJOR)\n",
    "2025-10-23 09:03:46.028 | info     |          Device | Closing user mode device drivers (tt_cluster.cpp:426)\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
